AITopics

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > California > Alameda County > Livermore (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.92)

Industry:

Information Technology (0.67)
Leisure & Entertainment > Games > Computer Games (0.40)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.45)

Neural Information Processing Systems, Feb-10-2026, 09:00:26 GMT

ca3a9be77f7e88708afb20c8cdf44b60-AuthorFeedback.pdf

agent, off-policy agent, on-policy agent, (15 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.73)

arXiv.org Artificial Intelligence, Nov-12-2025

Policy Transfer for Continuous-Time Reinforcement Learning: A (Rough) Differential Equation Approach

Guo, Xin, Lyu, Zijiu

This paper studies policy transfer, one of the well-known transfer learning techniques adopted in large language models, for two classes of continuous-time reinforcement learning problems. For the first class, continuous-time linear-quadratic systems with Shannon's entropy regularization (a.k.a. LQRs), we fully exploit the Gaussian structure of the optimal policy and the stability of the associated Riccati equations. For the second class, where the system has possibly nonlinear and bounded dynamics, the key technical component is the stability of diffusion SDEs, which is established by invoking rough path theory. Our work provides the first theoretical proof of policy transfer for continuous-time RL: an optimal policy learned for one RL problem can be used to initialize the search for a near-optimal policy in a closely related RL problem, while maintaining the convergence rate of the original algorithm. To illustrate the benefit of policy transfer for RL, we propose a novel policy learning algorithm for continuous-time LQRs, which achieves global linear convergence and local super-linear convergence. As a byproduct of our analysis, we derive the stability of a concrete class of continuous-time score-based diffusion models via their connection with LQRs.
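The warm-start idea behind policy transfer can be illustrated on the simplest possible case. The sketch below is not the paper's algorithm: it assumes a scalar, deterministic continuous-time LQR (dynamics dx/dt = a x + b u, running cost q x^2 + r u^2), drops the entropy regularization and the Gaussian exploration noise, and uses Kleinman's classical Newton-type policy iteration in place of the authors' method. The helper names `riccati_gain`, `evaluate`, and `kleinman` are hypothetical. "Transfer" here simply means starting the iteration on a target system from the optimal gain of a nearby source system rather than from an arbitrary stabilizing gain.

```python
# Hypothetical sketch: policy transfer as warm-starting Kleinman policy iteration
# for a scalar continuous-time LQR without entropy regularization (an assumption).
import math


def riccati_gain(a, b, q, r):
    """Closed-form optimal gain k* and value coefficient p* (V(x) = p* x^2).

    p* is the positive root of the scalar algebraic Riccati equation
    2 a p - (b^2 / r) p^2 + q = 0, and the optimal control is u = -k* x.
    """
    p = (a + math.sqrt(a * a + b * b * q / r)) * r / (b * b)
    return b * p / r, p


def evaluate(a, b, q, r, k):
    """Policy evaluation for u = -k x: solve 2 (a - b k) p + q + r k^2 = 0."""
    closed_loop = a - b * k
    if closed_loop >= 0:
        raise ValueError("gain k does not stabilize the system")
    return (q + r * k * k) / (-2.0 * closed_loop)


def kleinman(a, b, q, r, k0, tol=1e-12, max_iter=100):
    """Alternate policy evaluation and improvement from a stabilizing gain k0.

    Returns (gain, value coefficient, number of iterations used).
    """
    k = k0
    for it in range(1, max_iter + 1):
        p = evaluate(a, b, q, r, k)
        k_new = b * p / r  # improvement: minimizer of r u^2 + 2 p x b u
        if abs(k_new - k) < tol:
            return k_new, evaluate(a, b, q, r, k_new), it
        k = k_new
    return k, evaluate(a, b, q, r, k), max_iter


# Source problem (a = 1.0) is solved first; the target (a = 1.1) is "closely related".
k_source, _ = riccati_gain(1.0, 1.0, 1.0, 1.0)
k_cold, _, n_cold = kleinman(1.1, 1.0, 1.0, 1.0, k0=10.0)
k_warm, _, n_warm = kleinman(1.1, 1.0, 1.0, 1.0, k0=k_source)
print(n_cold, n_warm)
```

With these illustrative numbers both runs reach the same gain, and the warm-started run should need fewer iterations. The source gain is only a valid starting point because it also stabilizes the target system; `evaluate` raises an error otherwise.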

artificial intelligence, machine learning, reinforcement learning, (18 more...)

2510.15165

Country: North America > United States (0.14)

Genre: Research Report (1.00)

Industry: Education (0.87)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.61)

arXiv.org Artificial Intelligence, Nov-4-2025

A Generalized Bisimulation Metric of State Similarity between Markov Decision Processes: From Theoretical Propositions to Applications

Tao, Zhenyu, Xu, Wei, You, Xiaohu

The bisimulation metric (BSM) is a powerful tool for computing state similarities within a Markov decision process (MDP), revealing that states closer in BSM have more similar optimal value functions. While BSM has been successfully utilized in reinforcement learning (RL) for tasks like state representation learning and policy exploration, its application to multiple-MDP scenarios, such as policy transfer, remains challenging. Prior work has attempted to generalize BSM to pairs of MDPs, but a lack of rigorous analysis of its mathematical properties has limited further theoretical progress. In this work, we formally establish a generalized bisimulation metric (GBSM) between pairs of MDPs and rigorously prove that it satisfies three fundamental properties: GBSM symmetry, inter-MDP triangle inequality, and a distance bound on identical state spaces. Leveraging these properties, we theoretically analyse policy transfer, state aggregation, and sampling-based estimation in MDPs, obtaining explicit bounds that are strictly tighter than those derived from the standard BSM. Additionally, GBSM provides a closed-form sample complexity for estimation, improving upon existing asymptotic results based on BSM. Numerical results validate our theoretical findings and demonstrate the effectiveness of GBSM in multi-MDP scenarios.
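As a rough illustration of what a state-similarity metric between two MDPs computes, the sketch below iterates a bisimulation-style recursion across two small MDPs. It is not the paper's GBSM: it assumes deterministic transitions and a shared action set, so the Kantorovich (Wasserstein) transition term collapses to the distance between the two successor states, and it uses the common weighting of 1 on the reward gap and gamma on the transition term. The function names and the example MDPs are hypothetical. The final check uses the value-similarity property the abstract mentions: for this recursion, |V1*(s) - V2*(t)| <= d(s, t).

```python
# Hypothetical sketch of a cross-MDP bisimulation-style metric (not the paper's GBSM).
# Assumptions: both MDPs are deterministic and share the same actions.
# R[s][a] is the reward and T[s][a] the index of the next state.
from itertools import product


def cross_mdp_metric(R1, T1, R2, T2, gamma, tol=1e-10, max_iter=10_000):
    """Iterate d(s, t) = max_a |R1[s][a] - R2[t][a]| + gamma * d(T1[s][a], T2[t][a])."""
    n1, n2, n_actions = len(R1), len(R2), len(R1[0])
    d = [[0.0] * n2 for _ in range(n1)]
    for _ in range(max_iter):
        new = [
            [
                max(
                    abs(R1[s][a] - R2[t][a]) + gamma * d[T1[s][a]][T2[t][a]]
                    for a in range(n_actions)
                )
                for t in range(n2)
            ]
            for s in range(n1)
        ]
        delta = max(abs(new[s][t] - d[s][t]) for s, t in product(range(n1), range(n2)))
        d = new
        if delta < tol:  # the update is a gamma-contraction, so this terminates
            break
    return d


def optimal_values(R, T, gamma, tol=1e-12, max_iter=10_000):
    """Value iteration for a deterministic MDP."""
    V = [0.0] * len(R)
    for _ in range(max_iter):
        new = [max(R[s][a] + gamma * V[T[s][a]] for a in range(len(R[s]))) for s in range(len(R))]
        delta = max(abs(x - y) for x, y in zip(new, V))
        V = new
        if delta < tol:
            break
    return V


# Two MDPs with identical dynamics and slightly different rewards.
T = [[1, 2], [2, 0], [0, 1]]
R1 = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
R2 = [[0.0, 1.1], [0.9, 0.0], [0.5, 0.6]]
gamma = 0.9

d = cross_mdp_metric(R1, T, R2, T, gamma)
V1, V2 = optimal_values(R1, T, gamma), optimal_values(R2, T, gamma)
for s, t in product(range(3), range(3)):
    print(s, t, round(d[s][t], 3), round(abs(V1[s] - V2[t]), 3))
```

Comparing an MDP with itself gives zero distance on the diagonal, which is the single-MDP case; the general stochastic setting would need an optimal-transport solver for the transition term.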

artificial intelligence, inequality, machine learning, (14 more...)

2509.18714

Country: North America > United States (1.00)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.84)

Neural Information Processing Systems, Oct-10-2025, 09:29:12 GMT

Overcoming the Sim-to-Real Gap: Leveraging Simulation to Learn to Explore for Real-World RL

Andrew Wagenmaker

exploration, probability, sim, (16 more...)

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > California > Alameda County > Livermore (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.92)

Industry:

Information Technology (0.67)
Leisure & Entertainment > Games > Computer Games (0.40)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.45)

Neural Information Processing Systems, Oct-2-2025, 21:44:04 GMT

Multi-View Reinforcement Learning

Minne Li, Lisheng Wu, Jun WANG, Haitham Bou Ammar

Neural Information Processing Systems http://nips.cc/

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Country: Europe > United Kingdom (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Artificial Intelligence, Sep-24-2025

MV-UMI: A Scalable Multi-View Interface for Cross-Embodiment Learning

Rayyan, Omar, Abanes, John, Hafez, Mahmoud, Tzes, Anthony, Abu-Dakka, Fares

Recent advances in imitation learning have shown great promise for developing robust robot manipulation policies from demonstrations. However, this promise is contingent on the availability of diverse, high-quality datasets, which are not only challenging and costly to collect but are often constrained to a specific robot embodiment. Portable handheld grippers have recently emerged as intuitive and scalable alternatives to traditional robotic teleoperation methods for data collection. However, their sole reliance on first-person, wrist-mounted cameras often limits how much scene context they can capture. In this paper, we present MV-UMI (Multi-View Universal Manipulation Interface), a framework that integrates a third-person perspective with the egocentric camera to overcome this limitation. This integration mitigates domain shifts between human demonstration and robot deployment, preserving the cross-embodiment advantages of handheld data-collection devices. Our experimental results, including an ablation study, demonstrate that our MV-UMI framework improves performance in sub-tasks requiring broad scene understanding by approximately 47% across 3 tasks, confirming the effectiveness of our approach in expanding the range of feasible manipulation tasks that can be learned using handheld gripper systems, without compromising the cross-embodiment advantages inherent to such systems.

artificial intelligence, deployment, robot, (16 more...)

2509.18757

Country:

Europe > Switzerland (0.28)
Europe > Austria (0.28)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Robots > Manipulation (0.48)

Neural Information Processing Systems, Aug-16-2025, 11:26:58 GMT

ca3a9be77f7e88708afb20c8cdf44b60-AuthorFeedback.pdf

agent, off-policy agent, on-policy agent, (15 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.73)

arXiv.org Artificial Intelligence, Jul-31-2025

Successor Features for Transfer in Alternating Markov Games

Amatya, Sunny, Ren, Yi, Xu, Zhe, Zhang, Wenlong

This paper explores successor features for knowledge transfer in zero-sum, complete-information, turn-based games. Prior research in single-agent systems has shown that successor features can provide a "jump start" for agents facing new tasks with varying reward structures. However, knowledge transfer in games typically relies on value and equilibrium transfers, which depend heavily on the similarity between tasks. This reliance can lead to failures when the tasks differ significantly. To address this issue, this paper applies successor features to games and presents a novel algorithm called Game Generalized Policy Improvement (GGPI), designed to address Markov games in multi-agent reinforcement learning. The proposed algorithm enables the transfer of learned values and policies across games. An upper bound on the transfer error is derived as a function of task similarity. Through experiments with a turn-based pursuer-evader game, we demonstrate that the GGPI algorithm can generate high-reward interactions and one-shot policy transfer. When further tested on a wider set of initial conditions, the GGPI algorithm achieves higher success rates with improved path efficiency compared to the baseline algorithms.
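GGPI builds on generalized policy improvement (GPI) over successor features, and the abstract does not spell out how GGPI adapts it to alternating, zero-sum games. The sketch below therefore shows only the standard single-agent building block, under the usual assumption that rewards are linear in features, r = phi . w: each source policy i's action values on a new task follow from its successor features as Q_i(s, a) = psi_i(s, a) . w, and GPI acts greedily with respect to max_i Q_i(s, a). The tabular layout `psi[i][s][a]`, the function names, and the numbers are hypothetical and chosen for illustration.

```python
# Minimal sketch of single-agent GPI with successor features (not the paper's GGPI).
# Assumption: rewards are linear in features, r(s, a) = phi(s, a) . w.


def q_value(psi_sa, w):
    """Q(s, a) = psi(s, a) . w for one policy's successor-feature vector."""
    return sum(f * wj for f, wj in zip(psi_sa, w))


def gpi_action(psi, w, s):
    """Return (action, value) maximizing max_i psi[i][s][a] . w over actions a.

    psi[i][s][a] is the successor-feature vector of source policy i.
    """
    n_actions = len(psi[0][s])
    best_a, best_q = None, float("-inf")
    for a in range(n_actions):
        q = max(q_value(policy[s][a], w) for policy in psi)  # best source policy
        if q > best_q:
            best_a, best_q = a, q
    return best_a, best_q


# Two source policies, one state, two actions, 2-D features (illustrative numbers).
psi = [
    [[[2.0, 0.0], [0.0, 0.5]]],  # policy 0: psi[0][0][a]
    [[[0.5, 0.5], [0.0, 3.0]]],  # policy 1: psi[1][0][a]
]
for w in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0]):
    print(w, gpi_action(psi, w, s=0))
```

A new task only requires a new weight vector `w`; no source policy is retrained, which is the "jump start" the abstract refers to. How GGPI handles the opponent's turns is not described in the abstract, so it is left out here.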

artificial intelligence, machine learning, reinforcement learning, (15 more...)

2507.22278

Country: North America > United States > Arizona (0.28)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Artificial Intelligence, Mar-27-2025

Learning Generalizable Skills from Offline Multi-Task Data for Multi-Agent Cooperation

Liu, Sicong, Shu, Yang, Guo, Chenjuan, Yang, Bin

Learning cooperative multi-agent policies from offline multi-task data that generalize to unseen tasks with varying numbers of agents and targets is an attractive problem in many scenarios. Although aggregating general behavior patterns across multiple tasks into skills is a promising way to improve policy transfer, two primary challenges hinder further progress in skill learning for offline multi-task MARL. First, extracting general cooperative behaviors from various action sequences as common skills fails to bring cooperative temporal knowledge into those skills. Second, existing works involve only common skills and cannot adaptively choose independent knowledge as task-specific skills in each task for fine-grained action execution. To tackle these challenges, we propose Hierarchical and Separate Skill Discovery (HiSSD), a novel approach for generalizable offline multi-task MARL through skill learning. HiSSD leverages a hierarchical framework that jointly learns common and task-specific skills. The common skills learn cooperative temporal knowledge and enable in-sample exploitation for offline multi-task MARL. The task-specific skills represent the priors of each task and achieve task-guided, fine-grained action execution. To validate our method, we conduct experiments on the multi-agent MuJoCo and SMAC benchmarks. After training the policy with HiSSD on offline multi-task data, the empirical results show that HiSSD assigns effective cooperative behaviors and obtains superior performance on unseen tasks.

artificial intelligence, machine learning, task-specific skill, (15 more...)

2503.212

Country: Asia > China (0.04)

Genre:

Research Report > Promising Solution (0.54)
Research Report > New Finding (0.34)

Industry: Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)